Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the park/unpark mechanism #528

Merged
merged 4 commits into from Jan 3, 2019
Merged

Rewrite the park/unpark mechanism #528

merged 4 commits into from Jan 3, 2019

Conversation

ghost
Copy link

@ghost ghost commented Aug 7, 2018

The park/unpark mechanism now has identical implementation to std::thread::{park,park_timeout,unpark}.

Our previous implementation was cutting corners in a few places and while it seemed obviously correct, we decided to avoid playing with fire and just copy what std::thread does line by line.

@carllerche
Copy link
Member

Thanks for putting this together.

/cc @jsgf @sfackler We don't actually know if this is the cause of the bug, nor are we able to reproduce the bug. Are you able to test this change to see if it prevents the lost wakeups you have been seeing?

Copy link
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two implementations in tokio-threadpool and tokio-executor are literally the same - copy/paste and return Ok(()) from park in tokio-executor.

NOTIFY => return Ok(()),
IDLE => {},
_ => unreachable!(),
if self.state.compare_exchange(NOTIFY, IDLE, SeqCst, SeqCst).is_ok() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// If the duration is zero, then there is no need to actually block
if let Some(ref dur) = timeout {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is new in tokio-executor. @carllerche, any idea why the previous implementation didn't have it? Maybe just an oversight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversight... it would be nice to unify the two impls instead of duplicating code.

}

// The state is currently idle, so obtain the lock and then try to
// transition to a sleeping state.
let mut m = self.mutex.lock().unwrap();

// Transition to sleeping
match self.state.compare_and_swap(IDLE, SLEEP, Ordering::SeqCst) {
NOTIFY => {
match self.state.compare_exchange(IDLE, SLEEP, SeqCst, SeqCst) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
None => {
loop {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None => self.condvar.wait(m).unwrap(),
match timeout {
Some(timeout) => {
m = self.condvar.wait_timeout(m, timeout).unwrap().0;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokio-executor/src/park.rs Outdated Show resolved Hide resolved
@jsgf
Copy link

jsgf commented Aug 7, 2018

+1 to unifying the implementations.

I'm a little hesitant to test this out though because we don't have a consistent reproducer, nor a good theory about why this change would help with the bug. I'm concerned about this turning into a "changing things until the problem goes away" exercise.

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a thought inline.

// Notified before we could sleep, consume the notification and
// exit
self.state.store(IDLE, Ordering::SeqCst);
self.state.store(IDLE, SeqCst);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems problematic for the same reason stated in the issue.

Given that this is intended to acquire all memory from unpark threads, it needs to establish a "happens-before" relationship with

match self.state.compare_exchange(IDLE, NOTIFY, SeqCst, SeqCst) {
.

I do not think that this is done (though, again, MFENCE on x86 will work).

@carllerche
Copy link
Member

@jsgf Well, does reverting back to tokio-threadpool 0.1.4 fix the problem?

@jsgf
Copy link

jsgf commented Aug 7, 2018

@carllerche I've been holding off until we're sure we can get a clear signal. It repros relatively rarely, and it might have been happening for a while at some rate.

I'm considering the possibility that the bug is actually elsewhere, and the changes to park/unpark are just changing the timing or something else so it presents differently.

I'm currently re-learning TLA+ so I can model park/unpark and see if there's something subtle we're all missing - with luck that will turn something up and so we can move forward with some confidence. If it doesn't then either the problem is somewhere else or the model is bad (but model+inspection gives more confidence than inspection alone).

@carllerche
Copy link
Member

@jsgf I agree that I am skeptical the bug is w/ park / unpark.

@carllerche
Copy link
Member

@stjepang This looks good to me, but I am a bit hesitant to merge w/o a repro or at least an understanding as to what the bug is.

Do you have thoughts on ways we might be able to repro?

@jsgf
Copy link

jsgf commented Aug 7, 2018

So far I haven't been able to find a problem with either the 0.1.4 or 0.1.5 park/unpark implementations.

// Notified before we could sleep, consume the notification and
// exit
self.state.store(IDLE, Ordering::SeqCst);
self.state.swap(IDLE, SeqCst);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using swap as a store seems like overkill since the value is discarded. On x86 this will generate an unnecessary locked instruction. (ie, the std version should use store.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the version in std is incorrect. The swap is required to establish ordering with this line in unpark. Without doing a "read" operation here, visibility cannot be acquired.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. I didn't check std, but I assume the swap here came from there.

Also, my understanding is that SeqCst should establish total ordering with other SeqCst loads and stores, so this can be a plain store and still be ordered with respect to compare_exchange. (The std::atomic::Ordering docs are irritatingly imprecise here though, so I don't know if I'm actually right - it seems to imply that SeqCst isn't meaningful for store.)

Copy link
Author

@ghost ghost Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsgf

A store operation with Release ordering (or stronger) writes to memory and allows other future load operations to synchronize with it.

However, here we also want to perform a load operation at the same time (hence swap) in order to synchronize with whoever last wrote to this location (i.e. acquire their writes to memory).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't SeqCst already guarantee that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsgf, from an x86 codegen perspective, I don’t think a SeqCst store or swap is going to matter - it’s going to be some instruction that entails LOCK like behavior for both. In fact, I see xchg reg, mem on the playground for both.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carllerche

Without doing a "read" operation here, visibility cannot be acquired.

Isn't the read at line 257 sufficient?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the read at line 257 sufficient?

I was about to ask the same. The read of NOTIFY with SeqCst is sufficient to establish synchronization with the thread that notified.

Of course release/acquire would be sufficient as well but I am not going to try playing that game again ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because a third thread can come along and unpark while the park fn is between 257 and 262. The bug is specifically that the park thread must acquire the memory from a third thread that calls unpark before the park thread hits line 262.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Continuing in rust-lang/rust#53366)

@tobz
Copy link
Member

tobz commented Sep 6, 2018

Did this ever get anywhere once it ended up becoming a rust-lang discussion? Is the erroneous behavior that led us to believe this was the bug still present?

@ghost
Copy link
Author

ghost commented Sep 13, 2018

In #525, a deadlock was reported where #459 seems like the only meaningful change. Since the issue seems to be so difficult to track down, I think we should merge this PR and go with the same park/unpark implementation that has already been battle-tested in std::thread.

I really wonder if this will fix deadlocks.

@carllerche, what do you think?

@willmo
Copy link

willmo commented Sep 14, 2018

I think unpark() needs to be changed as well. See rust-lang/rust#53366 and rust-lang/rust#54174.

@carllerche
Copy link
Member

@stjepang What do you think about closing this since it seems unlikely that the bug is in our code. Also the std version has evolved since this PR.

@carllerche
Copy link
Member

Or at least updating to track std's latest changes.

@ghost
Copy link
Author

ghost commented Nov 13, 2018

@carllerche I'm going to update to reflect the std's latest implementation.

A question regarding tokio_executor::park and tokio_threadpool::park. They have almost exactly the same implementation, but with a small but crucial difference.

tokio_executor::park::ParkThread is !Send and always refers to the current thread (there's a single thread-local instance called CURRENT_PARK_THREAD), while tokio_threadpool::park::DefaultPark is Send and is a self-contained parking mechanism.

Originally I thought of putting an instance of tokio_executor::park::ParkThread inside tokio_threadpool::park::DefaultPark, but that's not going to work. Is there any way we could unify those two thread parking implementations?

@carllerche
Copy link
Member

@stjepang right, well the one that is stored in the thread local is to avoid having to recreate all the state each time when calling wait. It should just be a question of taking an "instance" version (used by threadpool) and storing that in the thread local. Thoughts?

@ghost
Copy link
Author

ghost commented Nov 19, 2018

It should just be a question of taking an "instance" version (used by threadpool) and storing that in the thread local.

Right. However, tokio-threadpool depends on tokio-executor so we'd have to move the "instance" version (i.e. DefaultPark/DefaultUnpark) into tokio-executor. Should we do that?

@carllerche
Copy link
Member

@stjepang it probably should be in a separate crate to be honest... maybe syncbox once that is ready?

bors bot added a commit to crossbeam-rs/crossbeam that referenced this pull request Nov 29, 2018
235: Add Parker for thread parking r=stjepang a=stjepang

This is just an extracted copy of the current implementation of `thread::park()` and `thread::unpark()`.

`Parker` is a low-level thread synchronization primitive useful for building others (like locks etc.). It would be useful in tokio-rs/tokio#528 and might remove some cruft and unnecessary TLS access from `context.rs` in `crossbeam-channel`.

I've also added a fast-path check for `timeout == Duration::from_secs(0)`, which Tokio relies on.

An interesting peculiarity of `Parker` is that it's not split into `Parker` and `Unparker`, which means any thread can call `park()` at any time. However, if multiple threads call `park()` at the same time, expect deadlocks or panics - that is the user's problem. Splitting the primitive into an owned `Parker` and shared `Unparker` would require us to wrap the inner structure into an `Arc`, which comes at a cost of allocation and indirection. Since this is a very low-level primitive, I chose not to do that.

The reason why this is in `crossbeam-utils` is because it's a really simple primitive with no dependencies. Also, `tokio-executor` and `tokio-threadpool` really don't want to pull in the whole `crossbeam` in order to use this.

cc @carllerche

Co-authored-by: Stjepan Glavina <stjepang@gmail.com>
@ghost
Copy link
Author

ghost commented Dec 8, 2018

I've updated the PR to use Parker from crossbeam-utils, which is just a copy-paste of the proven parking mechanism from std::thread. By relying on this dependency we also got a nice code cleanup.

@ghost
Copy link
Author

ghost commented Dec 15, 2018

@carllerche Just a ping for review :)

@carllerche carllerche merged commit 5e2d93f into tokio-rs:master Jan 3, 2019
@inq inq mentioned this pull request Jan 3, 2019
@ghost ghost deleted the rewrite-wakeup branch January 3, 2019 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants